
Overview

We are looking at using Jackrabbit to help solve a few different issues:

  • automatic incremental upload and download of learner data
  • on-demand syncing of subsets of the learner data
  • report query performance
  • replication

Automatic incremental upload and download of learner data

Benefits

This would support new features:

  • near realtime collaboration
  • near realtime reporting
  • more learner data because the upload is spread out over time

This would make the system more robust:

  • more data would be captured in the case of a computer failure
  • with the upload spread out over time, the final upload at the end of the session would go unnoticed
  • network failures can be detected in near real time

Requirements

  • an HTTP-based protocol must be used so this works well through school firewalls
  • should not impact the performance of the client software
  • client components do not have to know about remote persistence
  • in the case of a failure, the set of objects on the server should be internally consistent so they can be reloaded at a later time
  • if the network goes down, the learner data should still be saved to local disk at regular intervals in case the program crashes

Basic design

At regular intervals, send up to the server all changes that haven't been sent before. This should be done asynchronously, so the learner doesn't have to wait for the operation to complete.
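A rough sketch of what that could look like on the client side (the class and method names here are made up for illustration; this is not existing OTrunk or Jackrabbit API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative sketch: batch up changed learner-data objects and send
 * them on a timer, off the learner's UI thread.
 */
public class IncrementalUploader {
    private final Map<String, Object> unsent =
            new ConcurrentHashMap<String, Object>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Client components call this whenever a learner-data object changes. */
    public void markChanged(String id, Object snapshot) {
        unsent.put(id, snapshot);
    }

    /** Send unsent changes every 30 seconds without blocking the learner. */
    public void start() {
        scheduler.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                flushUnsent();
            }
        }, 30, 30, TimeUnit.SECONDS);
    }

    private void flushUnsent() {
        for (Map.Entry<String, Object> entry : unsent.entrySet()) {
            // ... hand entry.getValue() to the persistence layer here ...
            // (a real implementation would handle changes arriving mid-flush)
            unsent.remove(entry.getKey());
        }
    }
}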

Jackrabbit support

Jackrabbit has beta support for remoting JCR over WebDAV. This is done through the SPI stack.
One way to use this is:
client -> otrunk-jackrabbit -> jcr -> jcr2spi -> spi2dav -> internet -> jcr-server -> jackrabbit repository
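Newer Jackrabbit releases ship a JcrUtils helper in jackrabbit-jcr-commons that hides the jcr2spi/spi2dav wiring; with that and the jcr2dav client libraries on the classpath, opening a remote session is plain JCR (in older releases the SPI stack has to be wired up by hand). The URL and credentials below are placeholders:

import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;
import org.apache.jackrabbit.commons.JcrUtils;

public class RemoteLogin {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: points at a deployed jcr-server webapp.
        Repository repository =
                JcrUtils.getRepository("http://localhost:8080/jackrabbit/server");
        Session session = repository.login(
                new SimpleCredentials("learner", "secret".toCharArray()));
        try {
            // Everything past this point is ordinary JCR; the WebDAV
            // transport is hidden behind the jcr2spi -> spi2dav layers.
            System.out.println("Connected to " + session.getWorkspace().getName());
        } finally {
            session.logout();
        }
    }
}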

JCR uses the concept of transient objects: code using JCR can create a set of transient objects and then store them only when they are complete. This is done by calling save on an object or on the entire session. The jcr2spi -> spi2dav layer sends the objects over the network when save is called.
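For example, with the standard javax.jcr API (the node and property names below are made up for illustration):

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

// Build up a subtree of transient items, then persist them all at once.
// In the jcr2spi -> spi2dav stack nothing crosses the network until save().
void storeRun(Session session) throws RepositoryException {
    Node root = session.getRootNode();
    Node run = root.addNode("learnerRun");           // transient
    run.setProperty("student", "jdoe");              // transient
    run.setProperty("startTime", System.currentTimeMillis());
    session.save();  // the whole set is sent and stored together
}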

It is not clear how the JCR read methods behave in the jcr2spi -> spi2dav stack. Do they update all the time, do they only update on changes, or do they only update once? I saw tests for what looked like different variants of this but haven't explored them yet.
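Whatever the default turns out to be, the JCR API does define an explicit refresh, so stale read state can at least be discarded on demand:

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

// Force a re-read: refresh(false) discards pending local changes and
// returns items to their current saved state; refresh(true) would keep
// pending local changes while still picking up persisted ones.
String reloadStudent(Session session) throws RepositoryException {
    session.refresh(false);
    Node run = session.getRootNode().getNode("learnerRun");
    return run.getProperty("student").getString();
}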

Options and analysis

The basic stack described above cannot meet the requirements, for performance and consistency reasons. Instead, a local copy of each object needs to be maintained; these copies are then asynchronously written into the JCR nodes and saved.

Basic stack: call save at regular intervals

This would result in all the unsaved transient objects being saved over the network. It doesn't work because save is not thread safe, so all write calls need to stop during the save. If a user is in the middle of drawing or writing, they would have to wait for this to complete. If the save is done in another thread, then the stored objects might not be internally consistent, and some objects might be marked as saved when they have really been modified.
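For concreteness, the naive version would look roughly like this, and this is exactly the code that doesn't work:

import java.util.Timer;
import java.util.TimerTask;
import javax.jcr.Session;

// Naive periodic save on the basic stack. JCR sessions are not thread
// safe: while this timer thread runs save(), the learner's thread may
// still be mutating transient nodes through the same session, so a
// half-updated object graph can be persisted and marked as saved.
void startPeriodicSave(final Session session) {
    new Timer(true).schedule(new TimerTask() {
        public void run() {
            try {
                session.save();  // races with concurrent writes
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }, 30000, 30000);
}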

Basic stack: call save as each change is made

This would result in consistent data, but the performance is too slow since each keystroke or object movement has to wait for the save to complete.

Local copy

Implementing this seems like the only option. It can be done in a couple of ways:

  1. using a local jackrabbit repository that is synchronized/replicated with a remote repository.
  2. using a local OTrunk Database that is synchronized/replicated with jcr objects.

The first option would solve two problems at the same time, but it might be too heavyweight. The second option is lighter weight but would duplicate the synchronization/replication effort.
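Either way, the write path ends up looking roughly the same: client code changes plain local objects, and a background thread later copies the dirty ones into JCR nodes and saves them. A sketch of that flush step, assuming a hypothetical LocalObject holding the locally maintained state:

import java.util.Map;
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

/** Hypothetical local copy of one learner-data object. */
class LocalObject {
    String id;
    String serializedState;
}

// Background flush: copy each dirty local object into its JCR node,
// then persist the whole batch with a single save(). The learner's
// thread only ever touches the local objects, never the session.
void flushToRepository(Session session, Map<String, LocalObject> dirty)
        throws RepositoryException {
    Node root = session.getRootNode();
    for (LocalObject obj : dirty.values()) {
        Node node = root.hasNode(obj.id)
                ? root.getNode(obj.id)
                : root.addNode(obj.id);
        node.setProperty("state", obj.serializedState);
    }
    session.save();  // one internally consistent batch
    dirty.clear();
}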

One difficult design choice is how to deal with reading the data:

  • Should all read calls pull down information from the repository?
  • Should only the first read call pull down information, with later calls using that info? This could be combined with a refresh/reload option (see the sketch below).
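A sketch of the second strategy with an explicit reload (the cache layout and method names are illustrative only):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

/** First read pulls from the repository; later reads use the cached value. */
public class CachedReader {
    private final Session session;
    private final Map<String, String> cache =
            new ConcurrentHashMap<String, String>();

    public CachedReader(Session session) {
        this.session = session;
    }

    public String read(String propertyPath) throws RepositoryException {
        String value = cache.get(propertyPath);
        if (value == null) {
            value = session.getRootNode()
                    .getProperty(propertyPath).getString();
            cache.put(propertyPath, value);
        }
        return value;
    }

    /** The refresh/reload option: drop stale state and re-read. */
    public String reload(String propertyPath) throws RepositoryException {
        session.refresh(false);
        cache.remove(propertyPath);
        return read(propertyPath);
    }
}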

Amount of effort for number 1
This depends on the existing synchronization/replication options available for Jackrabbit. If none of them would work, it might be possible to reuse parts of the "update" and "merge" code that is already part of Jackrabbit, which handles synchronizing two workspaces within a single repository. If versioning is ignored, this should be pretty straightforward, so I'd estimate 2 weeks. If versioning is added in, it gets more complex, so I'd estimate a month.
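For reference, the workspace-level hooks mentioned here are exposed in the JCR 1.0 API itself; "local" below is a placeholder for whatever the source workspace would be called:

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

// Pull the state of a subtree from another workspace in the same
// repository.
void syncFromWorkspace(Session session) throws RepositoryException {
    Node run = session.getRootNode().getNode("learnerRun");
    run.update("local");  // replace this subtree with "local"'s state
    // With versioning in play, merge() reconciles diverged version
    // histories instead of blindly replacing:
    // run.merge("local", true);
}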

Amount of effort for number 2
There is no built-in support for versioning, so that could be ignored in this case.

Report query performance

Replication
